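Almost none of the more than 2,000 languages spoken in Africa have widely available automatic speech recognition systems, and the required data is available for only a few of them. We have experimented with two techniques that may offer pathways to large-vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open-source data and collected data for 15 languages, and trained experimental models using these techniques. Our results show that pooling the small amounts of data available in multilingual end-to-end models, and pre-training on unsupervised data, can help improve speech recognition quality for many African languages.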
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play a crucial role in privacy-preserving artificial intelligence and can also be used to augment small datasets. Here we show that diffusion probabilistic models can synthesize high-quality medical imaging data, demonstrated for magnetic resonance imaging (MRI) and computed tomography (CT) images. We provide quantitative measurements of their performance through a reader study with two medical experts who rated the quality of the synthesized images in three categories: realistic image appearance, anatomical correctness, and consistency between slices. Furthermore, we demonstrate that synthetic images can be used in self-supervised pre-training and improve the performance of breast segmentation models when data is scarce (Dice score of 0.91 without vs. 0.95 with synthetic data).
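The Dice score quoted in the abstract above measures the overlap between a predicted and a reference segmentation mask. The following is a minimal sketch, assuming binary masks stored as NumPy arrays; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are illustrative choices and are not taken from the paper.

```python
import numpy as np


def dice_score(pred_mask, target_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Hypothetical helper: returns 1.0 when both masks are empty,
    which is a convention chosen here, not one stated in the source.
    """
    pred = np.asarray(pred_mask).astype(bool)
    target = np.asarray(target_mask).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x2 masks: one overlapping pixel, two predicted, one reference.
example_pred = np.array([[1, 1], [0, 0]])
example_target = np.array([[1, 0], [0, 0]])
example_dice = dice_score(example_pred, example_target)
```

For three-dimensional volumes the same function applies unchanged, because the sums run over all array elements regardless of dimensionality.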
Language is one of the primary means by which we describe the 3D world around us. While rapid progress has been made in text-to-2D-image synthesis, similar progress in text-to-3D-shape synthesis has been hindered by the lack of paired (text, shape) data. Moreover, extant methods for text-to-shape generation have limited shape diversity and fidelity. We introduce TextCraft, a method that addresses these limitations by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs for training. TextCraft achieves this by using CLIP together with a multi-resolution approach: it first generates a shape in a low-dimensional latent space and then upscales it to a higher resolution, improving the fidelity of the generated shape. To improve shape diversity, we use a discrete latent space that is modelled using a bidirectional transformer conditioned on the interchangeable image-text embedding space induced by CLIP. Moreover, we present a novel variant of classifier-free guidance, which further improves the accuracy-diversity trade-off. Finally, we perform extensive experiments that demonstrate that TextCraft outperforms state-of-the-art baselines.
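Cardiac magnetic resonance (CMR) sequences visualize cardiac function voxel-wise over time. At the same time, deep learning-based deformable image registration can estimate discrete vector fields that warp one time step of a CMR sequence to the next in a self-supervised manner. However, despite the rich source of information contained in these 3D+t vector fields, a standardized interpretation is challenging, and clinical applications have remained limited so far. In this work, we show how to use a deformable vector field efficiently to describe the underlying dynamic process of a cardiac cycle in the form of a derived 1D motion descriptor. In addition, based on the expected cardiovascular physiological properties of a contracting or relaxing ventricle, we define a set of rules that enables the identification of five cardiovascular phases, including end-systole (ES) and end-diastole (ED), without the use of labels. We evaluate the plausibility of the motion descriptor on two challenging multi-disease, multi-center, multi-scanner short-axis CMR datasets: first, by reporting quantitative measures such as the periodic frame difference for the extracted phases; second, by qualitatively comparing the general pattern when we temporally resample and align the motion descriptors of all instances across both datasets. The average periodic frame difference of our approach for the ED and ES key phases is $0.80 \pm 0.85$ and $0.69 \pm 0.79$, which is slightly better than the inter-observer variability ($1.07 \pm 0.86$, $0.91 \pm 1.6$) and the supervised baseline method ($1.18 \pm 1.91$, $1.21 \pm 1.78$). Code and labels will be made available in our GitHub repository: https://github.com/cardio-ai/cmr-phase-detection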
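The periodic frame difference used in the evaluation above can be illustrated with a short sketch. It assumes that "periodic" means the distance between two frame indices is measured on a cycle, so that the last frame of a sequence is adjacent to the first; the abstract does not spell out the formula, and the function name and arguments below are hypothetical.

```python
def periodic_frame_difference(predicted_frame, reference_frame, n_frames):
    """Wrap-around distance between two frame indices of a cyclic sequence.

    Assumed definition: min(|i - j|, T - |i - j|) for a cycle of T frames.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    direct = abs(predicted_frame - reference_frame) % n_frames
    return min(direct, n_frames - direct)


# In a 30-frame cycle, frames 1 and 29 are two frames apart, not 28.
example_wrapped = periodic_frame_difference(1, 29, 30)
example_direct = periodic_frame_difference(5, 8, 30)
```

Measuring the error this way avoids penalizing a prediction that lands just across the boundary between two consecutive cardiac cycles.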
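Recent developments in the angle-resolved photoemission spectroscopy (ARPES) technique involve spatially resolving samples while maintaining the high-resolution features of momentum space. This development readily increases the size and complexity of the data to be analyzed; one resulting task is to label similar dispersion cuts and map them spatially. In this work, we demonstrate that recent representation learning (self-supervised learning) models combined with k-means clustering can help automate that part of the data analysis and save valuable time, albeit with low performance. Finally, we introduce few-shot learning (k-nearest neighbors, or kNN) in the representation space, where we selectively choose one reference image (k = 1) for each known label and subsequently label the rest of the data according to the nearest reference image. This last approach demonstrates the strength of self-supervised learning for automating image analysis in ARPES in particular, and it can be generalized to any scientific data analysis that involves image data.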
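The nearest-reference labeling step described above can be sketched as follows. This is a minimal illustration, assuming the images have already been mapped to fixed-length embedding vectors by some self-supervised encoder (not shown) and that Euclidean distance is used; the abstract does not state the distance measure, and all names and toy values here are hypothetical.

```python
import numpy as np


def label_by_nearest_reference(embeddings, reference_embeddings, reference_labels):
    """Give every embedding the label of its single nearest reference (k = 1)."""
    # Pairwise squared Euclidean distances, shape (n_samples, n_references).
    differences = embeddings[:, None, :] - reference_embeddings[None, :, :]
    squared_distances = np.sum(differences ** 2, axis=-1)
    nearest_index = np.argmin(squared_distances, axis=1)
    return [reference_labels[i] for i in nearest_index]


# Toy 2D "embeddings": one reference per known label, three unlabeled samples.
example_references = np.array([[0.0, 0.0], [10.0, 10.0]])
example_reference_labels = ["dispersion_a", "dispersion_b"]
example_samples = np.array([[0.5, -0.2], [9.0, 11.0], [4.0, 4.0]])
example_labels = label_by_nearest_reference(
    example_samples, example_references, example_reference_labels
)
```

Because only one reference per label is needed, the manual effort reduces to picking a single representative image for each class; the resulting labels can then be mapped back to the spatial positions of the corresponding cuts.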
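Differential privacy (DP) has become the gold standard for protecting an individual's privacy in datasets by adding calibrated noise to each data sample. While its application to categorical data is straightforward, its usability in the context of images has been limited. In contrast to categorical data, the meaning of an image is inherent in the spatial correlation of neighboring pixels, which makes the simple application of noise infeasible. Invertible neural networks (INNs) have shown excellent generative performance while still providing the ability to quantify the exact likelihood. Their principle is based on transforming a complicated distribution into a simple one, e.g., an image into a spherical Gaussian. We hypothesize that adding noise to the latent space of an INN can enable differentially private image modification. Manipulating the latent space leads to a modified image while preserving important details. Furthermore, by conditioning the INN on metadata provided with the dataset, we aim to leave dimensions that are important for downstream tasks such as classification untouched, while altering other parts that potentially contain identifying information. We term our method content-aware differential privacy (CADP). We conduct experiments on publicly available benchmarking datasets as well as dedicated medical ones. In addition, we show the generalizability of our method to categorical data. The source code is publicly available at https://github.com/cardio-ai/cadp.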
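To make "calibrated noise" concrete, the sketch below shows the classic Laplace mechanism, in which the noise scale is the sensitivity divided by the privacy budget epsilon. This is a generic textbook illustration and not the CADP implementation: the abstract does not state which noise distribution is applied to the latent space, and the function name, the toy latent vector, and the chosen sensitivity are assumptions.

```python
import numpy as np


def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to each entry.

    Generic illustration of calibrated noise; a smaller epsilon means
    stronger privacy and therefore larger noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=values.shape)


# Hypothetical latent vector; in practice it would be clipped so that the
# stated sensitivity actually bounds how much one sample can change it.
example_rng = np.random.default_rng(0)
example_latent = np.zeros(8)
example_noisy_latent = laplace_mechanism(
    example_latent, sensitivity=1.0, epsilon=0.5, rng=example_rng
)
```

In a latent-space setting, the perturbed vector would then be passed back through the inverse of the network to obtain the modified image.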
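We present ShapeCrafter, a neural network for recursive text-conditioned 3D shape generation. Existing methods for generating text-conditioned 3D shapes consume an entire text prompt to generate a 3D shape in a single step. However, humans tend to describe shapes recursively: we may start with an initial description and progressively add details based on intermediate results. To capture this recursive process, we introduce a method that generates a 3D shape distribution, conditioned on an initial phrase, that gradually evolves as more phrases are added. Since existing datasets are insufficient for training this approach, we present Text2Shape++, a large dataset of 369K shape-text pairs that supports recursive shape generation. To capture the local details that are often used to refine shape descriptions, we build on vector-quantized deep implicit functions, which generate a distribution of high-quality shapes. Results show that our method can generate shapes consistent with the text descriptions, and that the shapes evolve gradually as more phrases are added. Our method supports shape editing and extrapolation, and can enable new applications in human-machine collaboration for creative design.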
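We design a fast car detection and tracking algorithm for fisheye video from cameras mounted at crossroads. We use the ICIP 2020 VIP Cup dataset and adopt YOLOv5 as the base object detection model. The nighttime videos in this dataset are very challenging, and the detection accuracy (AP50) of the base model is about 54%. We design a reliable car detection and tracking algorithm based on the concept of bounding box propagation between frames, which provides accuracy improvements of 17.9 percentage points (pp) and 7 pp for the nighttime and daytime videos, respectively. To speed up processing, the grayscale frame difference is used for the intermediate frames in a segment, which can double the processing speed.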
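A grayscale frame difference of the kind mentioned above can be sketched as follows. This is only an illustration of the basic operation, assuming RGB frames as NumPy arrays of shape (height, width, 3); how the paper uses the difference within a segment is not described in the abstract, and the function names, the luminance weights, and the threshold value are assumptions.

```python
import numpy as np


def to_grayscale(frame_rgb):
    """Convert an (H, W, 3) RGB frame to (H, W) luminance values."""
    weights = np.array([0.299, 0.587, 0.114])  # common luma weights
    return frame_rgb.astype(np.float64) @ weights


def motion_mask(previous_rgb, current_rgb, threshold=25.0):
    """Mark pixels whose grayscale value changed by more than the threshold."""
    difference = np.abs(to_grayscale(current_rgb) - to_grayscale(previous_rgb))
    return difference > threshold


# Toy frames: a 2x2 white block appears in an otherwise black 4x4 image.
example_previous = np.zeros((4, 4, 3), dtype=np.uint8)
example_current = example_previous.copy()
example_current[1:3, 1:3, :] = 255
example_mask = motion_mask(example_previous, example_current)
```

Such a mask is far cheaper to compute than a full detector pass, which is consistent with using it on intermediate frames to save time.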
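Multi-object tracking (MOT) is a challenging task that involves detecting objects in a scene and tracking them across a sequence of frames. Evaluating this task is difficult because of temporal occlusions and varying trajectories across a sequence of images. The main evaluation metric for benchmarking MOT methods on datasets such as KITTI has become the higher order tracking accuracy (HOTA) metric, which describes performance better than metrics such as MOTA, DetA, and IDF1. Point detection and tracking is a closely related task that can be regarded as a special case of object detection. However, there are differences in how the detection task itself is evaluated (point distances vs. bounding box overlap). When the temporal dimension and multi-view scenarios are included, the evaluation task becomes even more complex. In this work, we propose a multi-view higher order tracking metric (MVHOTA) to determine the accuracy of multi-point (multi-instance and multi-class) detection while taking temporal and spatial associations into account. MVHOTA can be interpreted as the geometric mean of the detection, association, and correspondence accuracies, thereby giving equal weight to each factor. We demonstrate a use case on a publicly available endoscopic point detection dataset from a previously organized medical challenge. Furthermore, we compare against other adjusted MOT metrics for this use case, discuss the properties of MVHOTA, and show how the proposed correspondence accuracy and occlusion index facilitate the analysis of methods with respect to occlusion handling. The code will be made publicly available.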
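The geometric-mean interpretation given above can be illustrated with a few lines of code. This sketch only shows how three accuracies in the range [0, 1] would be combined with equal weight; it does not reproduce how the individual accuracies are computed or aggregated in the actual metric, and the function and argument names are hypothetical.

```python
def geometric_mean_of_accuracies(detection_acc, association_acc, correspondence_acc):
    """Cube root of the product of three accuracies, each in [0, 1]."""
    for value in (detection_acc, association_acc, correspondence_acc):
        if not 0.0 <= value <= 1.0:
            raise ValueError("accuracies must lie in [0, 1]")
    return (detection_acc * association_acc * correspondence_acc) ** (1.0 / 3.0)


# A single weak factor pulls the combined score down noticeably.
example_balanced = geometric_mean_of_accuracies(0.5, 0.5, 0.5)
example_one_failed = geometric_mean_of_accuracies(1.0, 1.0, 0.0)
```

Unlike an arithmetic mean, the geometric mean drops to zero as soon as any one factor is zero, so a method cannot compensate for a complete failure in, say, correspondence with strong detection alone.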